Current trends in high dimensional massive data analysis
Similar resources
Methods for regression analysis in high-dimensional data
With advances in science, knowledge, and technology, new and precise methods for measuring, collecting, and recording information have been developed, leading to the emergence and growth of high-dimensional data. A high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...
Comparing Massive High-Dimensional Data Sets
The comparison of two data sets can reveal a great deal of information about the time-varying nature of an observed process. For example, suppose that the points in a data set represent a customer’s activity by their location in n-dimensional space. A comparison of the distribution of points in two such data sets can indicate how the customer activity has changed between the observation period...
Identifying Information-Rich Subspace Trends in High-Dimensional Data
Identifying information-rich subsets in high-dimensional spaces and representing them as order-revealing patterns (or trends) is an important and challenging research problem in many science and engineering applications. The information quotient of large-scale high-dimensional datasets is significantly reduced by the curse of dimensionality, which makes the traditional clustering and association...
Longitudinal High-Dimensional Data Analysis
We develop a flexible framework for modeling high-dimensional functional and imaging data observed longitudinally. The approach decomposes the observed variability of high-dimensional observations measured at multiple visits into three additive components: a subject-specific functional random intercept that quantifies the cross-sectional variability, a subject-specific functional slope that qua...
Joining Massive High-Dimensional Datasets
We consider the problem of joining massive datasets. We propose two techniques for minimizing disk I/O cost of join operations for both spatial and sequence data. Our techniques optimize the available buffer space using a global view of the datasets. We build a boolean matrix on the pages of the given datasets using a lower bounding distance predictor. The marked entries of this matrix represen...
Journal
Journal title: Korean Journal of Applied Statistics
Year: 2016
ISSN: 1225-066X
DOI: 10.5351/kjas.2016.29.6.999